AITopics | bias 2

Collaborating Authors

bias 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

9adc8ada9183f4b9a007a02773fd8114-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 02:31:26 GMT

artificial intelligence, excess risk, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > China > Beijing > Beijing (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Boosting methods for interval-censored data with regression and classification

Bian, Yuan, Yi, Grace Y., He, Wenqing

arXiv.org Machine LearningJan-27-2026

Boosting has garnered significant interest across both machine learning and statistical communities. Traditional boosting algorithms, designed for fully observed random samples, often struggle with real-world problems, particularly with interval-censored data. This type of data is common in survival analysis and time-to-event studies where exact event times are unobserved but fall within known intervals. Effective handling of such data is crucial in fields like medical research, reliability engineering, and social sciences. In this work, we introduce novel nonparametric boosting methods for regression and classification tasks with interval-censored data. Our approaches leverage censoring unbiased transformations to adjust loss functions and impute transformed responses while maintaining model accuracy. Implemented via functional gradient descent, these methods ensure scalability and adaptability. We rigorously establish their theoretical properties, including optimality and mean squared error trade-offs. Our proposed methods not only offer a robust framework for enhancing predictive accuracy in domains where interval-censored data are common but also complement existing work, expanding the applicability of existing boosting techniques. Empirical studies demonstrate robust performance across various finite-sample scenarios, highlighting the practical utility of our approaches.

artificial intelligence, conference paper, machine learning, (19 more...)

arXiv.org Machine Learning

2601.17973

Country: North America (0.28)

Genre: Research Report > New Finding (0.92)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Add feedback

On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

Neural Information Processing SystemsOct-9-2025, 02:29:08 GMT

The widely observed'benign overfitting phenomenon' in the neural network literature raises the challenge to the'bias-variance trade-off' doctrine in the statistical learning theory. Since the generalization ability of the'lazy trained' over-parametrized neural network can be well approximated by that of the neural tangent

artificial intelligence, machine learning, nullt 1 2, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Add feedback

Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression

Zuo, Yifei, Yin, Yutong, Zeng, Zhichen, Li, Ang, Zhu, Banghua, Wang, Zhaoran

arXiv.org Artificial IntelligenceOct-3-2025

Transformer architectures have achieved remarkable success in various domains. While efficient alternatives to Softmax Attention have been widely studied, the search for more expressive mechanisms grounded in theoretical insight-even at greater computational cost-has been relatively underexplored. In this work, we bridge this gap by proposing Local Linear Attention (LLA), a novel attention mechanism derived from nonparametric statistics through the lens of test-time regression. First, we show that LLA offers theoretical advantages over Linear and Softmax Attention for associative memory via a bias-variance trade-off analysis. Next, we address its computational challenges and propose two memory-efficient primitives to tackle the $Θ(n^2 d)$ and $Θ(n d^2)$ complexity. We then introduce FlashLLA, a hardware-efficient, blockwise algorithm that enables scalable and parallel computation on modern accelerators. In addition, we implement and profile a customized inference kernel that significantly reduces memory overheads. Finally, we empirically validate the advantages and limitations of LLA on test-time regression, in-context regression, associative recall and state tracking tasks. Experiment results demonstrate that LLA effectively adapts to non-stationarity, outperforming strong baselines in test-time training and in-context learning, and exhibiting promising evidence for its scalability and applicability in large-scale models. Code is available at https://github.com/Yifei-Zuo/Flash-LLA.

artificial intelligence, machine learning, softmax attention, (18 more...)

arXiv.org Artificial Intelligence

2510.0145

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Effects of Distributional Biases on Gradient-Based Causal Discovery in the Bivariate Categorical Case

Schwabe, Tim, Lange, Moritz, Wiskott, Laurenz, Acosta, Maribel

arXiv.org Machine LearningSep-3-2025

Gradient-based causal discovery shows great potential for deducing causal structure from data in an efficient and scalable way. Those approaches however can be susceptible to distributional biases in the data they are trained on. We identify two such biases: Marginal Distribution Asymmetry, where differences in entropy skew causal learning toward certain factorizations, and Marginal Distribution Shift Asymmetry, where repeated interventions cause faster shifts in some variables than in others. For the bivariate categorical setup with Dirichlet priors, we illustrate how these biases can occur even in controlled synthetic data. To examine their impact on gradient-based methods, we employ two simple models that derive causal factorizations by learning marginal or conditional data distributions - a common strategy in gradient-based causal discovery. We demonstrate how these models can be susceptible to both biases. We additionally show how the biases can be controlled. An empirical evaluation of two related, existing approaches indicates that eliminating competition between possible causal factorizations can make models robust to the presented biases.

artificial intelligence, intervention, machine learning, (14 more...)

arXiv.org Machine Learning

2509.01621

Country:

Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
(10 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

When do Random Forests work?

Revelas, C., Boldea, O., Werker, B. J. M.

arXiv.org Machine LearningApr-17-2025

We study the effectiveness of randomizing split-directions in random forests. Prior literature has shown that, on the one hand, randomization can reduce variance through decorrelation, and, on the other hand, randomization regularizes and works in low signal-to-noise ratio (SNR) environments. First, we bring together and revisit decorrelation and regularization by presenting a systematic analysis of out-of-sample mean-squared error (MSE) for different SNR scenarios based on commonly-used data-generating processes. We find that variance reduction tends to increase with the SNR and forests outperform bagging when the SNR is low because, in low SNR cases, variance dominates bias for both methods. Second, we show that the effectiveness of randomization is a question that goes beyond the SNR. We present a simulation study with fixed and moderate SNR, in which we examine the effectiveness of randomization for other data characteristics. In particular, we find that (i) randomization can increase bias in the presence of fat tails in the distribution of covariates; (ii) in the presence of irrelevant covariates randomization is ineffective because bias dominates variance; and (iii) when covariates are mutually correlated randomization tends to be effective because variance dominates bias. Beyond randomization, we find that, for both bagging and random forests, bias can be significantly reduced in the presence of correlated covariates. This last finding goes beyond the prevailing view that averaging mostly works by variance reduction. Given that in practice covariates are often correlated, our findings on correlated covariates could open the way for a better understanding of why random forests work well in many applications.

artificial intelligence, covariate, machine learning, (18 more...)

arXiv.org Machine Learning

2504.1286

Country: Europe > Netherlands (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback

Faw, Matthew, Caramanis, Constantine, Hoffmann, Jessica

arXiv.org Machine LearningMar-7-2025

Unconscious bias has been shown to influence how we assess our peers, with consequences for hiring, promotions and admissions. In this work, we focus on affinity bias, the component of unconscious bias which leads us to prefer people who are similar to us, despite no deliberate intention of favoritism. In a world where the people hired today become part of the hiring committee of tomorrow, we are particularly interested in understanding (and mitigating) how affinity bias affects this feedback loop. This problem has two distinctive features: 1) we only observe the biased value of a candidate, but we want to optimize with respect to their real value 2) the bias towards a candidate with a specific set of traits depends on the fraction of people in the hiring committee with the same set of traits. We introduce a new bandits variant that exhibits those two features, which we call affinity bandits. Unsurprisingly, classical algorithms such as UCB often fail to identify the best arm in this setting. We prove a new instance-dependent regret lower bound, which is larger than that in the standard bandit setting by a multiplicative function of $K$. Since we treat rewards that are time-varying and dependent on the policy's past actions, deriving this lower bound requires developing proof techniques beyond the standard bandit techniques. Finally, we design an elimination-style algorithm which nearly matches this regret, despite never observing the real rewards.

algorithm, bias 0, log null 12, (16 more...)

arXiv.org Machine Learning

2503.05662

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Add feedback

Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect

Khan, Mohammad Kaviul Anam, Saarela, Olli, Kustra, Rafal

arXiv.org Machine LearningJan-28-2025

Interpreting black-box machine learning models is challenging due to their strong dependence on data and inherently non-parametric nature. This paper reintroduces the concept of importance through "Marginal Variable Importance Metric" (MVIM), a model-agnostic measure of predictor importance based on the true conditional expectation function. MVIM evaluates predictors' influence on continuous or discrete outcomes. A permutation-based estimation approach, inspired by \citet{breiman2001random} and \citet{fisher2019all}, is proposed to estimate MVIM. MVIM estimator is biased when predictors are highly correlated, as black-box models struggle to extrapolate in low-probability regions. To address this, we investigated the bias-variance decomposition of MVIM to understand the source and pattern of the bias under high correlation. A Conditional Variable Importance Metric (CVIM), adapted from \citet{strobl2008conditional}, is introduced to reduce this bias. Both MVIM and CVIM exhibit a quadratic relationship with the conditional average treatment effect (CATE).

artificial intelligence, machine learning, predictor, (15 more...)

arXiv.org Machine Learning

2501.16988

Country: North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

No Free Lunch From Random Feature Ensembles

Ruben, Benjamin S., Tong, William L., Chaudhry, Hamza Tahir, Pehlevan, Cengiz

arXiv.org Machine LearningDec-6-2024

Given a budget on total model size, one must decide whether to train a single, large neural network or to combine the predictions of many smaller networks. We study this trade-off for ensembles of random-feature ridge regression models. We prove that when a fixed number of trainable parameters are partitioned among K independently trained models, K = 1 achieves optimal performance, provided the ridge parameter is optimally tuned. We then derive scaling laws which describe how the test risk of an ensemble of regression models decays with its total size. We identify conditions on the kernel and task eigenstructure under which ensembles can achieve near-optimal scaling laws. Training ensembles of deep convolutional neural networks on CIFAR-10 and a transformer architecture on C4, we find that a single large network outperforms any ensemble of networks with the same total number of parameters, provided the weight decay and feature-learning strength are tuned to their optimal values. Ensembling methods are a well-established tool in machine learning for reducing the variance of learned predictors. While traditional ensemble approaches like random forests (Breiman, 2001) and XGBoost (Chen & Guestrin, 2016) combine many weak predictors, the advent of deep neural networks has shifted the state of the art toward training a single large predictor (LeCun et al., 2015). However, deep neural networks still suffer from various sources of variance, such as finite datasets and random initialization (Atanasov et al., 2022; Adlam & Pennington, 2020; Lin & Dobriban, 2021; Atanasov et al., 2024).

artificial intelligence, ensemble, machine learning, (18 more...)

arXiv.org Machine Learning

2412.05418

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis

Tan, Yan Shuo, Ronen, Omer, Saarinen, Theo, Yu, Bin

arXiv.org Machine LearningJun-28-2024

Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression model that is commonly used in causal inference and beyond. Its strong predictive performance is supported by theoretical guarantees that its posterior distribution concentrates around the true regression function at optimal rates under various data generative settings and for appropriate prior choices. In this paper, we show that the BART sampler often converges slowly, confirming empirical observations by other researchers. Assuming discrete covariates, we show that, while the BART posterior concentrates on a set comprising all optimal tree structures (smallest bias and complexity), the Markov chain's hitting time for this set increases with $n$ (training sample size), under several common data generative settings. As $n$ increases, the approximate BART posterior thus becomes increasingly different from the exact posterior (for the same number of MCMC samples), contrasting with earlier concentration results on the exact posterior. This contrast is highlighted by our simulations showing worsening frequentist undercoverage for approximate posterior intervals and a growing ratio between the MSE of the approximate posterior and that obtainable by artificially improving convergence via averaging multiple sampler chains. Finally, based on our theoretical insights, possibilities are discussed to improve the BART sampler convergence performance.

bias 2, denote, dim 1, (16 more...)

arXiv.org Machine Learning

2406.19958

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > Tasmania (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback